Finish Accumulators: a Deterministic Reduction Construct for Dynamic Task Parallelism
Authors
Abstract
Parallel reductions represent a common pattern for computing the aggregation of an associative and commutative operation, such as summation, across multiple pieces of data supplied by parallel tasks. In this paper, we introduce finish accumulators, a unified construct that supports predefined and user-defined deterministic reductions for dynamic async-finish task parallelism. Finish accumulators are designed to be integrated into terminally strict models of task parallelism, as in the X10 and Habanero-Java (HJ) languages, which are more general than the fully strict models of task parallelism found in Cilk and OpenMP. In contrast to lower-level reduction constructs such as atomic variables, the high-level semantics of finish accumulators allows for a wide range of implementations with different accumulation policies, e.g., eager computation vs. lazy computation. The best implementation can thus be selected based on a given application and the target platform that it will execute on. We have integrated finish accumulators into the Habanero-Java task parallel language, and used them in both research and teaching. In addition to their higher-level semantics, experimental results demonstrate that our Java-based implementation of finish accumulators delivers comparable or better performance for reductions relative to Java's atomic variables and concurrent collection libraries.
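To make the abstract's distinction between accumulation policies concrete, the sketch below models a SUM finish accumulator in plain Java with a "lazy" policy: each task deposits contributions into a striped partial (`java.util.concurrent.atomic.LongAdder`), and the final value is only combined when the enclosing finish scope completes. This is an illustrative sketch, not the HJ library's actual accumulator API; the class and method names (`SumAccumulator`, `put`, `resolve`, `get`) are hypothetical, and an `ExecutorService` with `awaitTermination` stands in for the implicit join of a `finish` block.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a SUM finish accumulator with a lazy
// accumulation policy: tasks add to a striped partial, and the
// result is materialized once, at the end of the finish scope.
class SumAccumulator {
    private final LongAdder partial = new LongAdder();
    private long result;
    private volatile boolean resolved = false;

    // Called by async tasks inside the finish scope.
    void put(long v) { partial.add(v); }

    // Called once, when the finish scope's join completes.
    void resolve() { result = partial.sum(); resolved = true; }

    // Deterministic: only readable after the finish scope ends.
    long get() {
        if (!resolved) throw new IllegalStateException("finish scope not complete");
        return result;
    }
}

public class AccumulatorDemo {
    public static void main(String[] args) throws InterruptedException {
        SumAccumulator ac = new SumAccumulator();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Analogue of: finish { for (i : 1..100) async { ac.put(i); } }
        for (int i = 1; i <= 100; i++) {
            final int v = i;
            pool.submit(() -> ac.put(v));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS); // the implicit join of finish
        ac.resolve();                                // value fixed at end of finish
        System.out.println(ac.get());                // prints 5050
    }
}
```

An "eager" policy would instead combine on every `put` (e.g., a single `AtomicLong.addAndGet`); the high-level semantics described in the abstract leave that choice to the implementation, since readers only observe the value after the finish scope ends.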
Similar resources
COMP 322: Fundamentals of Parallel Programming Module 1: Deterministic Shared-Memory Parallelism
1 Task-level Parallelism: 1.1 Task Creation and Termination (Async, Finish); 1.2 Computation Graphs; 1.3 Ideal Parallelism; 1.4 Multiprocessor Scheduling ...
Compiler Support for Work-Stealing Parallel Runtime Systems
Multiple programming models are emerging to address an increased need for dynamic task parallelism in multicore shared-memory multiprocessors. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Threading Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work-stealing, as embodied in Cilk’s implementation of dynamic spaw...
Work-First and Help-First Scheduling Policies for Terminally Strict Parallel Programs
Multiple programming models are emerging to address an increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel computing. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Threading Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work stealing, as embodied in...
Phaser Beams: Integrating Stream Parallelism with Task Parallelism
Current streaming languages place significant restrictions on the structure of parallelism that they support, and usually do not allow for dynamic task parallelism. In contrast, there are a number of task-parallel programming models that support dynamic parallelism but lack the ability to set up efficient streaming communications among dynamically varying sets of tasks. We address this gap by i...
Dynamic Task Parallelism with a GPU Work-Stealing Runtime System
NVIDIA’s Compute Unified Device Architecture (CUDA) and its attached C/C++ based API went a long way towards making GPUs more accessible to mainstream programming. So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason. The high number of computational cores and high memory bandwidth supported by the device makes ...